Add thinking support for claude-3-7-sonnet #443
Conversation
The chat is working so well for me. However, if I enable reasoning mode, title and tag generation fails.
Should reasoning be enabled for title & tag generation?
The issue is caused by `content[0]` in the synchronous call being the thinking block instead of the text block.
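For context, with extended thinking enabled the response returns the thinking block before the text block, so `content[0]` is no longer the answer text. A rough illustration (field values made up, not taken from this PR):

```python
# Illustrative non-streaming response body with extended thinking enabled;
# the thinking block precedes the text block, so content[0]["text"] does not exist.
res = {
    "content": [
        {
            "type": "thinking",
            "thinking": "Let me come up with a short title...",
            "signature": "EuYB...",  # opaque signature returned by the API
        },
        {
            "type": "text",
            "text": "Claude 3.7 extended thinking discussion",
        },
    ],
}

# Pick the first text block instead of blindly using content[0]:
answer = next((c["text"] for c in res["content"] if c.get("type") == "text"), "")
```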
After I tried this:

```python
def get_completion(self, payload: dict) -> str:
    response = requests.post(self.url, headers=self.headers, json=payload)
    if response.status_code == 200:
        res = response.json()
        print(res)
        if "content" in res and (res_content := res["content"]):
            # Find the first content item that contains text
            for content_item in res_content:
                if content_item.get("type") == "text" and "text" in content_item and (text := content_item["text"]):
                    return text
        return ""
    else:
        raise Exception(f"Error: {response.status_code} - {response.text}")
```

then tag generation worked. But title generation did not work; I didn't even see the request for it somehow...?
Does that PR allow setting the reasoning effort? Also, does it work if I'm using OpenRouter to call Claude?
I would recommend changing the valve to a boolean; it looks better in the UI when setting valves, since it becomes a simple toggle instead of a string.
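For reference, a minimal sketch of what a boolean valve could look like in an Open WebUI pipe (the valve name and class layout here are illustrative, not the PR's actual fields):

```python
from pydantic import BaseModel, Field

class Pipe:
    class Valves(BaseModel):
        # Rendered as a toggle in the valve settings instead of a free-text field
        ENABLE_THINKING: bool = Field(
            default=False,
            description="Enable Claude 3.7 extended thinking",
        )

    def __init__(self):
        self.valves = self.Valves()
```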
@thiswillbeyourgithub Yes, and no. There is a manifold to support reasoning on OpenRouter (https://github.com/rmarfil3/openwebui-openrouter-reasoning-tokens), but so far I haven't been able to make it work with Claude 3.7; it doesn't display the thinking.
@arty-hlr Made the requested changes to switch from string to bool.
@roryeckel you were on the right track. I made the changes to only enable reasoning when streaming, and now both title and tag generation are working.
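A sketch of that condition, assuming the pipe reads the stream flag from the request body (illustrative, not the merged diff; `ENABLE_THINKING` is a hypothetical valve name):

```python
def build_payload(self, body: dict, payload: dict) -> dict:
    """Attach the thinking block only for streamed chat requests, so the
    non-streaming title/tag generation calls keep returning plain text."""
    if body.get("stream", False) and self.valves.ENABLE_THINKING:
        payload["thinking"] = {"type": "enabled", "budget_tokens": 4096}
    return payload
```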
I think the amount of reasoning should not just be a "global" setting. You may want to change it on a per-query basis. The "reasoning_effort" parameter should be used for this. If reasoning is globally enabled in the pipeline, this parameter determines how much reasoning to apply (it should also be possible to set it to 0/None to disable reasoning in a specific chat, etc.). Since this parameter is a string, I would recommend supporting both integer values to specify the exact number of reasoning tokens, as well as a few names for "breakpoints", for example:

```python
{
    "none": None,    # Don't enable thinking
    "low": 1024,     # Minimum budget
    "medium": 4096,
    "high": 16384,
    "max": 32768,
}
```
I definitely agree with the reasoning effort part. Perhaps instead we define how many tokens are low, medium, and high in the pipeline valves. Then, use the native Open WebUI reasoning effort to map into the token budget.
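A sketch of that mapping, assuming three integer valves for the budgets and Open WebUI passing `reasoning_effort` through the request body (all names and defaults are illustrative):

```python
from pydantic import BaseModel, Field

class Valves(BaseModel):
    # Token budgets that the native effort levels map to
    THINKING_BUDGET_LOW: int = Field(default=1024)
    THINKING_BUDGET_MEDIUM: int = Field(default=4096)
    THINKING_BUDGET_HIGH: int = Field(default=16384)

def budget_from_effort(valves: Valves, reasoning_effort: str | None) -> int | None:
    """Map Open WebUI's reasoning_effort setting to a thinking token budget."""
    mapping = {
        "low": valves.THINKING_BUDGET_LOW,
        "medium": valves.THINKING_BUDGET_MEDIUM,
        "high": valves.THINKING_BUDGET_HIGH,
    }
    if reasoning_effort is None:
        return None  # nothing set: leave thinking disabled
    return mapping.get(str(reasoning_effort).strip().lower())
```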
Yeah that could work.
I like this approach. Makes sense given the effort is logarithmic in the number of budget tokens.
@reasv Made the changes to remove global configuration of thinking and moved to use the reasoning effort param.
Ready to be merged?
@roryeckel I think it's just a case of odd UX. From my understanding, the "medium" text is just a placeholder example of what could potentially be added there and does not take effect until an actual setting is made. It makes sense that the default is none, given most models until recently had no concept of reasoning effort.
@tjbck I'm okay with merging this for now and leaving configurable effort-to-budget-tokens mapping to a follow-up PR.
I just tested it; for me only low and none work. That aside, having "medium" as the default value even though it's actually none is very confusing. I think people will just assume it's broken, so I wouldn't merge in this state.
My current opinion is that OpenWebUI's reasoning_effort was implemented without consideration for models that can reason but DON'T by default. Remember, the reference implementation was O3, which HAS to reason, so the issues with "none" wouldn't have been caught.
@arty-hlr the problem is likely that medium and high are greater than your max_tokens param. If max_tokens < budget_tokens, the output stream hangs. Try upping it to be higher than the mapped budget tokens value. I haven't looked in detail into a mechanism to surface the error messages to the user.
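For reference, the Anthropic extended-thinking docs require `max_tokens` to be strictly greater than `thinking.budget_tokens`. A defensive clamp in the pipeline could look roughly like this (a sketch, not the PR's actual code):

```python
def apply_thinking(payload: dict, budget_tokens: int | None) -> dict:
    """Attach an extended-thinking block and keep max_tokens above the budget."""
    if budget_tokens:
        payload["thinking"] = {"type": "enabled", "budget_tokens": budget_tokens}
        # max_tokens must exceed budget_tokens, otherwise the request fails
        # and the output stream appears to hang
        if payload.get("max_tokens", 0) <= budget_tokens:
            payload["max_tokens"] = budget_tokens + 1024
    return payload
```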
@mikeyobrien I'll try that, but I'd say this is still not ready to be merged as an example in this state.
@mikeyobrien I'm confused: in the model settings or controls, that parameter is 128 by default, which actually doesn't make sense. I always thought those settings made zero sense in Open WebUI, tbh; what's set in the model advanced settings is not reflected in the controls during chat, and it's just such a mess... I would be in favor of not setting this with reasoning effort and going back to a valve. As it is, with the state of those parameters in Open WebUI, this is not usable unfortunately... :(
@roryeckel This is context size; I believe mike was talking about max tokens: Or is it so confusing that the names don't make sense? idk
Oops, that was my confusion. However, it's still the same story about the hard-coding. About your question from earlier: I think advancedparams is a sort of "temporary override" for the defaults that come from the model. You can pretty much ignore the defaults that advancedparams tells you, because they are "truly" coming from the model. Btw, "low" effort is way more than 128 tokens, so I don't think it's num_predict; I think it's Context Length that needs to be changed to fix the freezing issue though.
This is the issue I see in logs:
I'm opposed to reverting to the global valve setting, since there is no longer a mechanism to turn off thinking per-chat. Ideally the error is surfaced to the user.
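One way to surface such errors, assuming the pipe receives Open WebUI's `__event_emitter__` callable (a sketch; the exact event payload the UI expects may differ):

```python
async def report_error(__event_emitter__, message: str) -> None:
    """Push an error status to the chat UI instead of failing silently."""
    if __event_emitter__:
        await __event_emitter__(
            {
                "type": "status",
                "data": {"description": f"Error: {message}", "done": True},
            }
        )
```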
I am also opposed to making the value global. It needs to be settable per-conversation at least. This is not something where we must wait for an error. We can read the settings and see if there is a contradiction. Either
I am for solution #1. We can even let users turn it on/off with a valve.
Agreed, the default values in the tooltips should be removed entirely.
Yeah, I understand that it would be nice to be able to turn on/off or change how much thinking Claude 3.7 is doing in the chat. I would agree with solution 1 as well.
Alright, made the following changes:
Looks good! :)
Nice!
print(f"Full data: {data}") | ||
else: | ||
raise Exception(f"Error: {response.status_code} - {response.text}") | ||
"""Used for title and tag generation""" |
Wrong function? I think this should be on get_completion ...
Except for my one nit:
Works great! (I ported the code over to the function to test it)
So FWIW, "approving of these changes".
If anyone needs these changes for the pipe. Code is: Diff is:
```python
):
    try:
        budget_tokens = int(reasoning_effort)
    except ValueError as e:
```
```diff
-    except ValueError as e:
+    except (ValueError, TypeError) as e:
```
Reason for that is that reasoning_effort can be of NoneType, and then this fails with:

```
ERROR [open_webui.functions] Error: int() argument must be a string, a bytes-like object or a real number, not 'NoneType'
```

Could also do a check for reasoning_effort not being NoneType. (No idea why body.get() in that case would not apply the default value.)
I also got that defaults are not applied properly. If I set the model to default, reasoning_effort is None; I think because "default" means to use whatever the pipe considers the default. But the default says "medium", so it's a little bit confusing. I found the discussion above. My take is we can distinguish between None (as in NoneType) and "none" as in the string. That all said: this really should be a button to "Think" or not "Think" in Open WebUI, like in Grok. So it's kind of an interesting case.
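A defensive sketch of parsing that distinguishes NoneType from the literal string "none" (illustrative only, not the merged code):

```python
def parse_budget(reasoning_effort) -> int | None:
    """Return a thinking budget, treating None and the string "none" as disabled."""
    if reasoning_effort is None or str(reasoning_effort).strip().lower() == "none":
        return None
    try:
        return int(reasoning_effort)
    except (ValueError, TypeError):
        return None  # unrecognized value: fall back to thinking disabled
```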
I opened a discussion here to add a "Think" button: |
Now with async and logging (which might be helpful here as well, as it makes it super easy to see what is going on): https://gist.github.com/LionsAd/ed81504e2663dcf33a3d2efc2f9a31f4
None of this handles Claude with AWS Bedrock, right?
"None of this handles the claude with AWS Bedrock right..?" In theory it could, because AWS Bedrock just speaks Claude API (for all models), but you need a special signing key via boto3 client to make the request to the URL. So it's always a new token. That said, you can just setup the AWS Bedrock Access Gateway to get an Open AI compatible gateway. Btw. the way they transferred "reasoning_effort" into budget in the gateway is by taking a percentage of the max tokens available for the request. You can find the gateway here: https://github.com/aws-samples/bedrock-access-gateway If you run OpenWeb UI without docker you can run the gateway also in a python "venv":
then start it. |
For AWS Bedrock users, please see my bare-minimum implementation based on @mikeyobrien's pipeline: https://github.com/lentil32/anthropic_manifold_pipeline_aws_bedrock/
@LionsAd I just updated the bedrock-access-gateway. After this pull request is merged, do you think I will be able to just use the claude-3.7-sonnet thinking capability through the OpenAI API inside Open WebUI? I really hope it will work without any additional pipeline or function...
@krittaprot Even though it's off-topic, right now I am not seeing that aws-samples/bedrock-access-gateway#117 fixes it.
Thanks!
Add support for extended thinking for Claude 3.7 Sonnet
https://docs.anthropic.com/en/docs/build-with-claude/extended-thinking